[Content Understanding] Update toLlmInput page markers and filter LLMStats telemetry by chienyuanchang · Pull Request #38851 · Azure/azure-sdk-for-js

chienyuanchang · 2026-06-05T23:51:58Z

Packages impacted by this PR

@azure/ai-content-understanding

Issues associated with this PR

Design proposal: https://github.com/cognitive-services/ContentUnderstanding-Docs/issues/249
Agent Framework feedback that prompted the LLMStats: filtering: microsoft/agent-framework#5796 (Python: Adopt azure-ai-contentunderstanding to_llm_input in CU context provider)

Describe the problem that is addressed by this PR

The toLlmInput() helper renders Content Understanding AnalysisResult objects into LLM-friendly text. Two output-hygiene issues need to be addressed before the next CU service release:

The SDK currently emits page boundary markers in its own format. The upcoming service release (per ContentUnderstanding-Docs#249) will emit the same boundary using markers that begin with <!-- InputPageNumber:. The SDK should adopt the new format and avoid emitting duplicate markers when the service-supplied markdown already contains them.
The service occasionally surfaces internal telemetry strings (e.g. LLMStats: completion calls: 2; embedding calls: 1; completion latency: 7.71s) in the warnings collection. These are not Responsible-AI warnings, and downstream consumers (Agent Framework, LangChain) currently strip them with local regex workarounds. The SDK should filter them at the source so the noise never reaches the LLM-facing rai_warnings block.

What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

This PR makes the smallest possible surface change inside toLlmInput():

Page marker constant + guard. Add an INPUT_PAGE_MARKER_PREFIX constant and a hasInputPageMarker() check at the top of addPageMarkers(). If the markdown already includes any <!-- InputPageNumber: substring (case-sensitive), pass the markdown through unchanged. Otherwise inject the new-format marker via the existing spans / PageBreak paths.
Telemetry filter on warnings. Add a TELEMETRY_MESSAGE_PREFIXES = ["LLMStats:"] list and a small isTelemetryMessage() predicate. Inside formatWarnings(), skip entries whose message (after trimming leading whitespace) starts with any prefix. Filtering is scoped to the structured warnings list only; the document markdown body is never inspected, so legitimate LLMStats: text in documents is preserved.
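
The page-marker guard described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the constant and function names come from the description, but the full marker syntax (number followed by `-->`), the `<!-- PageBreak -->` token, and the simplified `addPageMarkersSketch` helper are assumptions. The real helper also uses span information, which is omitted here.

```typescript
// Prefix named in the PR description; the check against it is case-sensitive.
const INPUT_PAGE_MARKER_PREFIX = "<!-- InputPageNumber:";

// Assumed page-break token in service markdown (not confirmed by the PR text).
const PAGE_BREAK = "<!-- PageBreak -->";

function hasInputPageMarker(markdown: string): boolean {
  // Plain substring test: any occurrence means the service already added markers.
  return markdown.includes(INPUT_PAGE_MARKER_PREFIX);
}

// Assumed full marker shape; only the prefix is confirmed by the description.
function formatPageMarker(pageNumber: number): string {
  return `${INPUT_PAGE_MARKER_PREFIX} ${pageNumber} -->`;
}

// Hypothetical, simplified stand-in for addPageMarkers().
function addPageMarkersSketch(markdown: string, startPage = 1): string {
  if (hasInputPageMarker(markdown)) {
    return markdown; // pass through unchanged to avoid duplicate markers
  }
  return markdown
    .split(PAGE_BREAK)
    .map((page, i) => `${formatPageMarker(startPage + i)}\n${page.trim()}`)
    .join("\n\n");
}
```

Because the guard runs before any injection, calling the helper on its own output returns the input unchanged, which is the duplicate-suppression behavior the PR describes.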

Alternative considered: post-rendering regex on the YAML output (the workaround currently used by Agent Framework). Rejected because operating on the structured list before rendering is simpler, more robust to YAML escaping, and idempotent.
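
For comparison with the regex workaround, filtering the structured list before rendering might look like the sketch below. The `WarningLike` shape and the `filterWarnings` helper are hypothetical stand-ins for the filtering step inside formatWarnings(); only TELEMETRY_MESSAGE_PREFIXES and isTelemetryMessage() are named in this PR, and their bodies here are assumed.

```typescript
// Hypothetical warning shape; the real SDK type may carry more fields.
interface WarningLike {
  code?: string;
  message?: string;
}

const TELEMETRY_MESSAGE_PREFIXES = ["LLMStats:"];

function isTelemetryMessage(message: string | undefined): boolean {
  if (!message) {
    return false;
  }
  // Ignore leading whitespace only; the prefix comparison stays case-sensitive.
  const trimmed = message.trimStart();
  return TELEMETRY_MESSAGE_PREFIXES.some((prefix) => trimmed.startsWith(prefix));
}

// Stand-in for the filtering step; rendering to the rai_warnings block is omitted.
function filterWarnings(warnings: WarningLike[]): WarningLike[] {
  return warnings.filter((w) => !isTelemetryMessage(w.message));
}

const sample: WarningLike[] = [
  { message: "LLMStats: completion calls: 2; embedding calls: 1" },
  { message: "   LLMStats: completion latency: 7.71s" },
  { message: "llmstats: lowercase is not treated as telemetry" },
  { code: "ContentFiltered", message: "Content was filtered." }, // made-up example warning
];

const kept = filterWarnings(sample);
```

Since the predicate only sees raw message strings, it never has to account for YAML quoting or escaping, and running it twice yields the same list. When every entry is telemetry, the result is empty, so the caller can omit the rai_warnings block entirely.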

Are there test cases added in this PR? (If not, why?)

Yes. Updated existing tests for the new marker format and added six new unit tests:

Duplicate-marker suppression when service markdown already contains markers.
LLMStats: warnings dropped while real warnings are kept.
rai_warnings block omitted entirely when only LLMStats: warnings exist.
Case-sensitive filter (lowercase llmstats: is preserved).
Markdown body containing literal LLMStats: text is preserved verbatim.
Leading-whitespace LLMStats: warnings are filtered.

All 37 unit tests in test/public/node/llmInputHelper.spec.ts pass locally.

Provide a list of related PRs (if any)

Companion PRs in sibling SDKs:

Command used to generate this PR (applicable only to SDK release request PRs)

Not applicable. This PR modifies hand-authored helper code; no regeneration was performed.

Checklists

Added impacted package name to the issue description
Does this PR need any fixes in the SDK Generator? — No. Helper lives in src/static-helpers/llmInputHelper.ts (not generated).
Added a changelog (if necessary) — CHANGELOG.md updated under 1.2.0-beta.2 (Unreleased).

[Content Understanding] Update toLlmInput page markers and filter LLMStats telemetry

72ed27f

github-actions Bot added the Cognitive - Content Understanding label Jun 5, 2026

Update remaining page marker references in JS sample and README

9a3a489

pitbull231980-dotcom approved these changes Jun 6, 2026

chienyuanchang wants to merge 2 commits into main from cu-sdk/llm-input-helper-update
